Generating abbreviations using Google Books library

Authors

  • Valery D. Solovyev
  • Vladimir V. Bochkarev
Abstract

The article describes an original method for building a dictionary of abbreviations from the Google Books Ngram Corpus. The dictionary is built for Russian, but the methodology is language-independent and can be applied to any language. The dictionary can be used to determine the function of a period (sentence boundary or part of an abbreviation) during text segmentation in applied text-processing systems. The article describes the difficulties encountered in constructing the dictionary and the ways they were overcome. A model is constructed for estimating the probability of type I and type II errors (that is, the accuracy and completeness of extraction). Some statistical data on the use of abbreviations are also provided.
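The role such a dictionary plays in text segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: the function name split_sentences and the tiny ABBREVIATIONS set are hypothetical, and a real dictionary would be derived from corpus statistics rather than written by hand. The sketch treats a token-final period as a sentence boundary unless the token appears in the abbreviation dictionary.

```python
# Hypothetical hand-written mini-dictionary of Russian abbreviations.
# A real dictionary would be extracted from n-gram corpus data.
ABBREVIATIONS = {"г.", "т.е.", "им.", "др."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences on token-final periods,
    skipping periods that belong to a known abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # A period ends the sentence only if the token is not an abbreviation.
        if token.endswith(".") and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that has no final period.
        sentences.append(" ".join(current))
    return sentences


example = "Он родился в 1900 г. в Казани. Потом уехал."
result = split_sentences(example)
```

One limitation of this simplified rule is that a sentence ending in an abbreviation is never split, so a practical system would need additional cues, such as the capitalization of the following word.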

Similar resources

An Analytical Study of Online Public Access Catalogues in Comparison with Features of Amazon and Google: A Checklist Approach

Recent research in the field of cataloguing has confirmed that the library catalogue is losing its importance and that its service cannot match the level of Google, Google Scholar, Google Books and Amazon. As a result, library users bypass the library catalogue when looking for information. This scenario asserts the need for major shifts in various aspects of cataloguing and compels to e...

Exploring the Nature of the Smart Cities Research Landscape

As a research domain, Smart Cities is only beginning to emerge. This is evident from the number of publications, books, and other scholarly articles on smart cities indexed in Google Scholar and Elsevier's Scopus, an abstract and citation database. However, significant literature is available on related topics like intelligent city, digital city, and intelligent community based on search results research r...

Peachnote: Music Score Search and Analysis Platform

Hundreds of thousands of music scores are being digitized by libraries all over the world. In contrast to books, they generally remain inaccessible for content-based retrieval and algorithmic analysis. There is no analogue to Google Books for music scores, and there exist no large corpora of symbolic music data that would empower musicology in the way large text corpora are empowering computati...

Generating a dictionary of control models for event extraction

A subordination dictionary is important in a number of text processing applications. We present a method for generating such a dictionary for Russian verbs using Google Books Ngram data. The dictionary is intended for a Russian event extraction system, which uses it to define extraction patterns.

Alternative Metrics for Book Impact Assessment: Can Choice Reviews be a Useful Source?

This article assesses whether academic reviews in Choice: Current Reviews for Academic Libraries could be systematically used for indicators of scholarly impact, uptake or educational value for scholarly books. Based on 451 Choice book reviews from 2011 across the humanities, social sciences and science, there were significant but low correlations between Choice ratings and citation and non-cit...

Journal:
  • CoRR

Volume: abs/1410.1080

Publication date: 2014